Robust Model Selection for Classification of Microarrays
Authors
Abstract
Recently, microarray-based cancer diagnosis systems have been increasingly investigated. However, reducing the cost and assuring the reliability of such systems remain open problems in real clinical settings. To reduce the cost, we need a supervised classifier that uses the smallest number of genes for which the classifier is still sufficiently reliable. To obtain a reliable classifier, we should assess candidate classifiers and select the best one. In this selection process, however, the assessment criterion inevitably has large variance because of the limited number of samples and non-negligible observation noise. Therefore, even if a classifier with a very small number of genes exhibits the smallest leave-one-out cross-validation (LOO) error rate, it is not necessarily reliable, because classifiers based on a small number of genes tend to show large variance. We propose a robust model selection criterion, the min-max criterion, based on a bootstrap resampling simulation that assesses the variance of the estimated classification error rates. We applied our assessment framework to four published real gene expression datasets and one synthetic dataset. We found that a state-of-the-art procedure, weighted voting classifiers with the LOO criterion, had a non-negligible risk of selecting extremely poor classifiers, whereas the new min-max criterion could eliminate that risk. These findings suggest that our criterion provides a safer procedure for designing a practical cancer diagnosis system.
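The abstract does not spell out how the min-max criterion is computed, so the following Python sketch is only one plausible reading of it, not the authors' procedure. It assumes that each candidate model is a number of genes k, that the error of each candidate is estimated on the out-of-bag samples of repeated bootstrap resamples, and that the selected candidate is the one whose worst (maximum) bootstrap error is smallest. The nearest-centroid classifier and the mean-difference gene score are simple stand-ins for the weighted voting classifier mentioned in the abstract, and all function names are hypothetical.

```python
import random
from statistics import mean


def top_k_genes(X, y, k):
    """Rank genes by |mean(class 0) - mean(class 1)| (an assumed stand-in score)."""
    def score(g):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        return abs(mean(a) - mean(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def fit_centroids(X, y, genes):
    """Per-class mean expression over the selected genes."""
    centroids = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [mean(r[g] for r in rows) for g in genes]
    return centroids


def predict(centroids, genes, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
    return min((0, 1), key=dist)


def bootstrap_errors(X, y, k, n_boot, rng):
    """Out-of-bag error rates of a k-gene classifier over n_boot resamples."""
    n = len(X)
    errors = []
    while len(errors) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]
        # Skip degenerate resamples: no held-out sample or a missing class.
        if not oob or {y[i] for i in idx} != {0, 1}:
            continue
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        genes = top_k_genes(Xb, yb, k)  # gene selection is redone per resample
        centroids = fit_centroids(Xb, yb, genes)
        wrong = sum(predict(centroids, genes, X[i]) != y[i] for i in oob)
        errors.append(wrong / len(oob))
    return errors


def min_max_select(X, y, candidate_ks, n_boot=50, seed=0):
    """Pick the k whose maximum bootstrap error is smallest (assumed min-max rule)."""
    rng = random.Random(seed)
    worst = {k: max(bootstrap_errors(X, y, k, n_boot, rng)) for k in candidate_ks}
    best_k = min(worst, key=worst.get)
    return best_k, worst
```

Here X is a list of samples (each a list of expression values) and y is a list of 0/1 labels; a call such as min_max_select(X, y, [1, 5, 20]) returns the chosen gene count together with the worst-case error of every candidate. Ranking candidates by the maximum rather than the mean error penalizes models whose error estimate varies widely across resamples, which is the risk the abstract attributes to classifiers built on very few genes.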
Related resources
Primal and dual robust counterparts of uncertain linear programs: an application to portfolio selection
This paper proposes a family of robust counterparts for uncertain linear programs (LP), obtained for a general definition of the uncertainty region. The relationship between uncertainty sets using norm bodies and their corresponding robust counterparts defined by dual norms is presented. Those properties lead us to characterize primal and dual robust counterparts. The researchers show t...
Robust portfolio selection with polyhedral ambiguous inputs
Ambiguity in model inputs is typical, especially in portfolio selection problems, where the true distribution of the random variables is usually unknown. Here we use a robust optimization approach to address the ambiguity in the conditional-value-at-risk minimization model. We obtain explicit models of the robust conditional-value-at-risk minimization for polyhedral and correlated polyhedral am...
Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes strongly influence the Earth and the lives of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
Presentation of quasi-linear piecewise selected models simultaneously with designing of bump-less optimal robust controller for nonlinear vibration control of composite plates
The idea of using quasi-linear piecewise models is based on decomposing complicated nonlinear systems while simultaneously designing local controllers. Since proper performance and closed-loop stability of the final system are vital in multi-model controller design, the main problem in multi-model controllers is the number of the local models and their position not payi...
A two-stage robust model for portfolio selection by using goal programming
In portfolio selection models, uncertainty plays an important role. Parameter uncertainty moves the solution away from the optimum, so it must be accounted for in the models. In this paper we present a two-stage robust model that, in the first stage, determines the desired percentage of investment in each industrial group using return and risk measures from different industries. One rea...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining, research has focused on microarrays. Microarray datasets contain a large number of genes, most of which are usually irrelevant to the output class; hence gene selection (feature selection) is essential. Removing redundant genes increases the speed and accuracy of classification. After applying the gene se...
Journal title:
Volume 7, Issue
Pages -
Publication date: 2009